A Default First Order Family Weight Determination Procedure for WPDV Models
Abstract
Weighted Probability Distribution Voting (WPDV) is a newly designed machine learning algorithm, for which research is currently aimed at the determination of good weighting schemes. This paper describes a simple yet effective weight determination procedure, which leads to models that can produce competitive results for a number of NLP classification tasks.

1 The WPDV algorithm

Weighted Probability Distribution Voting (WPDV) is a supervised learning approach to classification. A case which is to be classified is represented as a set of feature-value pairs, F_case = {f_1 = v_1, ..., f_n = v_n}. An estimate of the probabilities of the various classes for the case in question is then based on the classes observed with similar feature-value pair sets in the training data. To be exact, the probability of class C for F_case is estimated as a weighted sum over all possible subsets F_sub of F_case:

    P̂(C) = N(C) · Σ_{F_sub ⊆ F_case} W_{F_sub} · freq(C | F_sub) / freq(F_sub)

with the frequencies (freq) measured on the training data, and N(C) a normalizing factor such that Σ_C P̂(C) = 1.

In principle, the weight factors W_{F_sub} can be assigned per individual subset. For the time being, however, they are assigned for groups of subsets. First of all, it is possible to restrict the subsets that are taken into account in the model, using the size of the subset (e.g. F_sub contains at most 4 elements) and/or its frequency (e.g. F_sub occurs at least twice in the training material). Subsets which do not fulfill the chosen criteria are not used. For the subsets that are used, weight factors are not assigned per individual subset either, but rather per "family", where a family consists of those subsets which contain the same combination of feature types (i.e. the same f_i).

The two components of a WPDV model, distributions and weights, are determined separately. In this paper, I will use the term training set for the data on which the distributions are based and tuning set for the data on the basis of which the weights are selected.
Whether these two sets should be disjoint or can coincide is one of the subjects under investigation.

2 Family weights

The various family weighting schemes can be classified according to the type of use they make of the tuning data. Here, I use a very rough classification, into weighting …
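The estimation formula from Section 1 can be sketched in code. The sketch below is an illustrative reading of the formula, not the authors' implementation; the data-structure choices (a mapping from feature-value subsets to class counts, and a mapping from families of feature names to weights) are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def wpdv_estimate(case, train_counts, weights, max_subset_size=4, min_freq=2):
    """Estimate class probabilities for `case` (dict: feature -> value)
    by weighted voting over feature-value subsets, per the WPDV formula.

    train_counts: frozenset of (feature, value) pairs -> {class: count},
                  as observed in the training data.
    weights:      family (frozenset of feature names) -> weight W_{F_sub}.
    """
    pairs = sorted(case.items())
    scores = defaultdict(float)
    # Enumerate all subsets F_sub of F_case up to the size restriction.
    for size in range(1, max_subset_size + 1):
        for subset in combinations(pairs, size):
            class_counts = train_counts.get(frozenset(subset))
            if class_counts is None:
                continue
            total = sum(class_counts.values())
            if total < min_freq:  # frequency restriction on F_sub
                continue
            # The family is the combination of feature types in the subset.
            family = frozenset(f for f, _ in subset)
            w = weights.get(family, 0.0)
            for cls, count in class_counts.items():
                # w * freq(C | F_sub) / freq(F_sub)
                scores[cls] += w * count / total
    # Normalize so the estimates sum to 1 (the N(C) factor).
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()} if norm else {}

# Toy usage with a single one-feature family (hypothetical data):
train_counts = {frozenset({("w", "the")}): {"DET": 8, "NOUN": 2}}
weights = {frozenset({"w"}): 1.0}
probs = wpdv_estimate({"w": "the"}, train_counts, weights)
```

With these toy counts, freq(DET | {w=the}) / freq({w=the}) = 8/10, so the model assigns P̂(DET) = 0.8 after normalization.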
Publication date: 2000